Caio Raphael

A character encoding maps characters (text) to byte sequences and back again. In serialization, it determines how text fields are turned into bytes.
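A minimal sketch in Python, chosen only for illustration (str.encode and bytes.decode are standard Python, not Odin APIs). The same text maps to different byte sequences depending on the encoding chosen, and decoding with that same encoding restores the original text.

```python
text = "caf\u00e9"  # "café" with a precomposed é (U+00E9)

utf8_bytes = text.encode("utf-8")      # encoding: text -> bytes
latin1_bytes = text.encode("latin-1")  # a different mapping for the same text

print(utf8_bytes)    # b'caf\xc3\xa9'  (é becomes 2 bytes)
print(latin1_bytes)  # b'caf\xe9'      (é becomes 1 byte)

# Decoding: bytes -> text, using the same encoding that produced the bytes.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```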

When someone says "custom packet encoding", they usually mean one of the following:
- A framing protocol (how the start and end of a message are delimited).
- A custom serialization/deserialization strategy.
- A binary or textual format for transmitting structures over the network.
Using "encoding" for packet framing strategies is technically valid but potentially ambiguous.
In networking, more specific terms (framing, serialization, wire format) are clearer.
The word "encoding" is not wrong in itself, but its meaning depends on the technical context.
In Odin's core library, JSON and CBOR are grouped under "encoding" (core:encoding/json and core:encoding/cbor).
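To make the distinction between framing and text encoding concrete, here is a minimal Python sketch of a hypothetical wire format: a 4-byte big-endian length prefix handles framing, and UTF-8 handles the text itself. The names frame_message and read_frames and the header layout are assumptions for illustration, not an established protocol.

```python
import struct
from typing import List, Tuple

# Hypothetical header: 4-byte big-endian unsigned length of the payload.
HEADER = struct.Struct(">I")


def frame_message(text: str) -> bytes:
    payload = text.encode("utf-8")               # encoding: text -> bytes
    return HEADER.pack(len(payload)) + payload   # framing: prefix with byte length


def read_frames(buffer: bytes) -> Tuple[List[str], bytes]:
    """Extract every complete frame; return (messages, leftover bytes)."""
    messages: List[str] = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            break  # incomplete frame: keep the bytes until more data arrives
        messages.append(buffer[start:end].decode("utf-8"))  # decoding
        offset = end
    return messages, buffer[offset:]


stream = frame_message("hello") + frame_message("ol\u00e1")
msgs, rest = read_frames(stream + b"\x00\x00")  # trailing partial header
print(msgs, rest)  # ['hello', 'olá'] b'\x00\x00'
```

Because the length prefix counts bytes rather than characters, "olá" is framed with a length of 4: the á occupies 2 bytes in UTF-8.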

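UTF-8 (Unicode Transformation Format – 8-bit)
Size:
- ASCII characters (0–127) use 1 byte
- Other characters use 2 to 4 bytes (2 for U+0080–U+07FF, 3 for U+0800–U+FFFF, 4 for U+10000–U+10FFFF)
- For text dominated by non-ASCII characters such as Chinese or Japanese, it can take more space than UTF-16 (most CJK characters take 3 bytes in UTF-8 versus 2 in UTF-16)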
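A quick way to check these size claims is to encode a few sample strings and count bytes. This Python sketch uses the "utf-16-le" codec so that the count excludes the 2-byte byte order mark that Python's plain "utf-16" codec prepends; the sample strings are arbitrary.

```python
samples = {
    "english": "hello",
    "accented": "caf\u00e9",        # café
    "chinese": "\u4e2d\u6587",      # 中文
    "emoji": "\U0001F600",          # 😀 (outside the BMP)
}

sizes = {}
for name, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # little-endian, no BOM
    sizes[name] = (utf8, utf16)
    print(f"{name:<9} chars={len(text)} utf-8={utf8} utf-16={utf16}")
```

The English sample is half the size in UTF-8, while the Chinese sample is smaller in UTF-16 (3 bytes versus 2 per character).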
The standard text encoding on the web (HTML, JSON, XML, etc.)
Backward compatible with ASCII; valid ASCII text is valid UTF-8
Serialization:
- UTF-8 can be considered a form of serialization for text: it turns a sequence of Unicode code points into a sequence of bytes
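To see UTF-8 as serialization, here is a minimal hand-written encoder in Python that packs each code point's bits into the 1- to 4-byte patterns and cross-checks the result against the built-in codec. It is a sketch for illustration: encode_utf8 and encode_utf8_string are hypothetical names, and unlike Python's codec it does not reject lone surrogates (U+D800 to U+DFFF).

```python
def encode_utf8(code_point: int) -> bytes:
    """Serialize one code point to UTF-8 (sketch: lone surrogates not rejected)."""
    if code_point < 0x80:        # 0xxxxxxx (ASCII, 1 byte)
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point outside the Unicode range")


def encode_utf8_string(text: str) -> bytes:
    return b"".join(encode_utf8(ord(ch)) for ch in text)


sample = "A\u00e9\u4e2d\U0001F600"  # A, é, 中, 😀: 1, 2, 3 and 4 bytes
assert encode_utf8_string(sample) == sample.encode("utf-8")

# ASCII text is already valid UTF-8: each byte equals the ASCII code.
assert encode_utf8_string("Hi!") == b"Hi!"
```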

UTF-16 (Unicode Transformation Format – 16-bit)
Size:
- BMP characters (Basic Multilingual Plane, U+0000 to U+FFFF) use 2 bytes
- Characters outside the BMP (e.g., most emoji, historical scripts) use 4 bytes (a surrogate pair)
- More compact than UTF-8 for text dominated by characters in U+0800–U+FFFF (e.g., Chinese, Japanese, Korean), which take 3 bytes in UTF-8 but 2 in UTF-16
Used internally by several platforms and APIs (e.g., Java and .NET strings, the Windows API)
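The surrogate-pair rule can be sketched in a few lines of Python: subtract 0x10000 from the code point, then split the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). to_surrogate_pair is a hypothetical helper; the result is cross-checked against Python's utf-16-be codec.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point outside the BMP into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (v >> 10)       # top 10 bits
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)  # 😀
print(hex(high), hex(low))  # 0xd83d 0xde00

# Same 4 bytes as Python's big-endian UTF-16 codec (no BOM).
expected = high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert "\U0001F600".encode("utf-16-be") == expected

# A BMP character is a single 2-byte code unit.
assert "\u4e2d".encode("utf-16-be") == b"\x4e\x2d"
```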

ASCII (American Standard Code for Information Interchange)
Legacy system compatibility: for old systems or devices that only support ASCII
Simple English text: when the text contains only basic characters (letters A–Z and a–z, digits 0–9, basic punctuation)
Simplicity: ASCII is a 7-bit code (128 characters) stored as exactly 1 byte per character, which simplifies processing in very basic systems
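Because ASCII only uses values 0 to 127, a byte sequence is pure ASCII exactly when every byte has its high bit clear. A minimal Python check follows; is_ascii_bytes is a hypothetical helper, and Python also provides the built-in bytes.isascii() and str.isascii() since 3.7.

```python
def is_ascii_bytes(data: bytes) -> bool:
    # ASCII is a 7-bit code, so every valid byte is below 0x80.
    return all(byte < 0x80 for byte in data)


print(is_ascii_bytes(b"Hello, 123!"))                # True
print(is_ascii_bytes("caf\u00e9".encode("utf-8")))   # False

# Encoding non-ASCII text as ASCII fails instead of silently losing data.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err.object[err.start:err.end])
```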